- The Use of UNIX Regular Expressions
-
- Regular expressions are a standard feature of the UNIX operating system.
- They are used to search for or replace text in files or in text typed at the
- keyboard. The UNIX commands GREP (Global Regular Expression Print) and
- SUBS (SUBstitute Strings) are used to manipulate text via regular expressions.
-
- Certain characters have a special meaning to the regular expression
- processor. These special characters are:
-
- . period matches any single character (a letter, digit, space or
- punctuation mark).
-
- ^ caret matches the beginning of the line
- grep '^#include'
- will find all lines beginning with #include
-
- $ dollar matches the end of the line
- grep '}$'
- will find all lines ending with a closing brace
-
- * asterisk matches zero or more repetitions of the preceding character
- (or bracketed set).
- grep 'test*'
- finds lines containing "tes" followed by any number of t's, including
- none, so "tes", "test" and "testt" all match.
-
- [] brackets match any one of the characters listed inside them.
- grep '^[abcd0123]'
- will find lines beginning with any one of the
- specified characters.
-
- - hyphen denotes a range of characters inside brackets.
- grep '[A-Z0-9]$'
- finds lines ending in a capital letter or a digit.
-
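- \ backslash removes the special meaning of the character that follows it.
- grep '\$'
- finds dollar signs in a line of text. The quotes keep the shell from
- stripping the backslash; without them grep receives a bare $, which
- matches the end of every line.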
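- The special characters above can be tried out with a short Python sketch.
- It uses Python's re module as a stand-in for grep (an assumption: for the
- characters covered here Python's syntax agrees with grep's basic regular
- expressions, but the two differ in other respects). The grep() helper and
- the sample lines are made up for illustration.

```python
import re


def grep(pattern, lines):
    """Return the lines that contain a match for pattern, as grep would print them."""
    regex = re.compile(pattern)
    # re.search() looks for the pattern anywhere in the line, like grep does.
    return [line for line in lines if regex.search(line)]


sample = [
    "#include <stdio.h>",
    "int main(void) {",
    "    return 0;",
    "}",
    "This costs $5",
    "tes",
    "testing",
    "Ends with X",
    "value 7",
]

# .  any single character: needs exactly one character between 'a' and 'c'
dot = grep(r"a.c", ["abc", "a-c", "a c", "ac"])
# ^  beginning of the line
caret = grep(r"^#include", sample)
# $  end of the line
dollar = grep(r"}$", sample)
# *  zero or more of the previous character: "tes" then any number of 't'
star = grep(r"test*", sample)
# [] any one of the listed characters, here at the start of the line
bracket = grep(r"^[abcd0123]", ["apple", "zebra", "3 items", "Dog"])
# -  a range inside brackets: last character is A-Z or 0-9
rng = grep(r"[A-Z0-9]$", sample)
# \  escaped dollar sign: a literal '$'
escaped = grep(r"\$", sample)
# bare $ is an anchor, so it matches at the end of every line
unescaped = grep(r"$", sample)

for name, result in [("dot", dot), ("caret", caret), ("dollar", dollar),
                     ("star", star), ("bracket", bracket), ("range", rng),
                     ("escaped", escaped), ("unescaped", unescaped)]:
    print(name, result)
```

- The last case shows why the backslash matters: an unescaped $ is an
- anchor, so every line matches.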
-
- NOTE: When the caret (^) is the first character after a left square
- bracket, it negates the set: the brackets then match any one character
- that is NOT listed.
- grep '[^A-Z]$'
- will find all lines whose last character is not a capital letter.
- grep '^[^A-Z]'
- will find all lines whose first character is not a capital letter. Note
- that in this case the caret is used in both contexts. Empty lines match
- neither pattern, because the bracket expression must match one character.
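- A similar sketch for negated sets, again using Python's re module in place
- of grep; the grep() helper and the sample lines are hypothetical.

```python
import re


def grep(pattern, lines):
    """Return the lines that contain a match for pattern, as grep would print them."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


lines = ["HELLO", "Hello", "hello", "ABC1", ""]

# [^A-Z] matches one character that is NOT a capital letter.
not_ending_upper = grep(r"[^A-Z]$", lines)
# Outside the brackets ^ anchors the start; inside, right after '[', it negates.
not_starting_upper = grep(r"^[^A-Z]", lines)

print(not_ending_upper)
print(not_starting_upper)
```

- The empty string matches neither pattern: a bracket expression, negated
- or not, must match exactly one character.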
-
- Pattern Segments
-
- One of the powerful functions of the SUBS command is the ability to
- rearrange the internal structure of text lines. This is done with
- pattern segments. Parts of the search pattern are marked as segments, and
- the segments are numbered from left to right, starting at 1. The
- replacement can then selectively delete or rearrange the numbered segments.
-
- \( denotes the beginning of a segment.
- \) denotes the end of a segment.
- \n (a backslash followed by a digit, such as \1) refers to segment n in
- the second (replacement) part of the SUBS command.
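- As a sketch of rearranging and deleting segments, the Python code below
- uses re.sub in place of SUBS. Python writes a segment as ( ... ) rather than
- \( ... \), but refers to it in the replacement the same way, as \1, \2 and
- so on. The "Last, First" name list is a made-up example.

```python
import re

# Two segments: the last name and the first name, in "Last, First" form.
# Segments are numbered by the order of their opening parentheses.
pattern = re.compile(r"^(\w+), (\w+)$")

names = ["Smith, John", "Doe, Jane"]

# Rearrange: write segment 2 before segment 1.
swapped = [pattern.sub(r"\2 \1", n) for n in names]

# Delete: keep only segment 2 and drop segment 1.
first_only = [pattern.sub(r"\2", n) for n in names]

print(swapped)
print(first_only)
```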
-
- The best way to understand this usage is by example.
-
- Say that you need to prepare a list of all C source files, but you want
- to replace the '.c' extension with '.OBJ'. The following SUBS command will
- do just that.
-
- SUBS "\(.*\)\.c" "\1\.OBJ"
-
- In this command the use of '.*' within the first segment is a powerful
- construction. It behaves like the DOS '*' wildcard: the period matches any
- character and the asterisk repeats it any number of times, so '.*' matches
- any sequence of characters, including none. The first segment therefore
- captures the file name, everything before the last '.c'. In the
- replacement, \1 inserts that file name and \.OBJ appends the literal
- text .OBJ.
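- The same conversion can be sketched in Python with re.sub, assuming Python's
- syntax as a stand-in for SUBS. Unlike the SUBS command, this version adds a $
- anchor so that only a trailing '.c' is replaced, and it writes the
- replacement as \1.OBJ because Python does not need the period escaped there.
- The function name to_object_name() is hypothetical.

```python
import re

# One segment, (.*), holds the file name; \. is a literal period and
# $ anchors the match so that only a trailing ".c" is replaced.
C_SOURCE = re.compile(r"(.*)\.c$")


def to_object_name(filename):
    """Replace a trailing '.c' with '.OBJ'; other names pass through unchanged."""
    # \1 inserts the captured file name; the period in ".OBJ" is literal here.
    return C_SOURCE.sub(r"\1.OBJ", filename)


files = ["main.c", "parser.c", "util.cpp", "README"]
print([to_object_name(f) for f in files])
```

- Because '.*' is greedy, a name such as 'a.b.c' becomes 'a.b.OBJ': the
- segment takes everything up to the last '.c'.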
-
-